{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## 第9章-EM算法及推广-高斯混合模型\n",
    "\n",
    "&emsp;&emsp;本节介绍如何用EM算法求解高斯混合模型中的参数，高斯混合模型是一个非常重要的模型，从理论上讲，可以用高斯混合模型逼近任何一个连续型的分布，这就类似于可以通过泰勒公式（泰勒展开）用一个多项式拟合任何一个函数。  \n",
    "\n",
    "### 明确隐变量，写出完全数据的对数似然函数\n",
    "&emsp;&emsp;高斯混合模型的概率分布模型为$\\displaystyle P(y|\\theta)=\\sum_{k=1}^K \\alpha_k \\phi(y|\\theta_k)$，其中$\\phi(y|\\theta_k)$是高斯分布密度函数，是由$\\theta_k$决定的，$\\alpha_k$表示$y$以$\\alpha_k$的概率来自于第$k$个正态分布$\\phi(y|\\theta_k)$，$\\theta=(\\alpha_1,\\cdots, \\alpha_K,\\mu_1,\\sigma_1^2,\\mu_2,\\sigma_2^2,\\cdots,\\mu_K,\\sigma_K^2)$，也就是说，要通过观测数据$y$估计$\\theta$中的值。  \n",
    "&emsp;&emsp;根据EM算法，存在一个隐变量$\\gamma$，$\\gamma$表示当前的$y$来自的高斯分布，对于第一个观测值，有$\\gamma_1=(\\gamma_{11},\\gamma_{12},\\cdots,\\gamma_{1K})$，其中根据书中的$\\gamma_{jk}$的定义：$$\\gamma_{jk} = \\left\\{ \n",
    "\\begin{array}{ll} \n",
    "1, & 第j个观测来自第k个分模型 \\\\\n",
    "0, & 否则\n",
    "\\end{array}\\right. \\\\ \n",
    "j = 1,2,\\cdots, N; k=1, 2,\\cdots,K$$以上是随机变量的分布，取第1个值的概率为$\\alpha_1$，取第2个值的概率为$\\alpha_2$，……，取第K个值的概率为$\\alpha_K$，一旦知道$\\gamma_1$的值，就知道从第几个高斯分布中抽取$y_1$。  \n",
    "$$\\begin{aligned} p(\\gamma_1,y_1|\\theta)\n",
    "&= p(\\gamma_1|\\theta) \\cdot p(y_1 | \\gamma_1,\\theta) \\\\\n",
    "&= \\alpha^{\\gamma_{11}}_1 \\cdot \\alpha^{\\gamma_{12}}_2 \\cdots \\alpha^{\\gamma_{1K}}_K \\phi(y_1|\\theta_1)^{\\gamma_{11}} \\phi(y_2|\\theta_2)^{\\gamma_{12}} \\cdots \\phi(y_1|\\theta_K)^{\\gamma_{1K}} \\\\\n",
    "=& \\prod_{k=1}^K \\left[ \\alpha_k \\phi(y_1 | \\theta_k) \\right]^{\\gamma_{1k}}\n",
    "\\end{aligned}$$这个是第1个样本点完全数据的密度函数。在极大化似然估计中是极大化似然函数，这需要所有样本点的联合分布，对于所有的样本点，概率密度函数为\n",
    "$$P(y,\\gamma|\\theta)=\\prod_{j=1}^N \\prod_{k=1}^K \\left[ \\alpha_k \\phi(y_j | \\theta_k) \\right]^{\\gamma_{jk}}$$  \n",
    "$\\displaystyle \\because \\prod_{j=1}^N \\prod_{k=1}^K \\alpha_k^{\\gamma_{jk}} = \\prod_{k=1}^K \\alpha_k^{\\sum_{j=1}^N \\gamma_{jk}}$，$\\displaystyle \\sum_{j=1}^N \\gamma_{jk}$表示在N个样本点中，一共有多少个是来自第$k$个高斯分布的，将该数量记为$\\displaystyle n_k=\\sum_{j=1}^N \\gamma_{jk}$，$n_1+n_2+\\cdots+n_K=N$  \n",
    "$\\displaystyle \\therefore \\prod_{k=1}^K \\alpha_k^{\\sum_{j=1}^N \\gamma_{jk}} = \\prod_{k=1}^K \\alpha_k^{n_k}$  \n",
    "$\\displaystyle \\therefore \\prod_{j=1}^N \\prod_{k=1}^K \\alpha_k^{\\gamma_{jk}} = \\prod_{k=1}^K \\alpha_k^{n_k}$  \n",
    "$\\displaystyle \\therefore P(y, \\gamma|\\theta) = \\prod_{k=1}^K \\alpha_k^{n_k} \\prod_{j=1}^N \\big[\\phi(y_i|\\theta_k)\\big]^{\\gamma_{jk}} = \\prod_{k=1}^K \\alpha_k^{n_k} \\cdot \\prod_{j=1}^N \\left[ \\frac{1}{\\sqrt{2\\pi} \\sigma_k} \\exp\\left(-\\frac{(y_j-\\mu_k)^2}{2 \\sigma_k^2} \\right) \\right]^{\\gamma_{jk}}$  \n",
    "$\\displaystyle \\therefore \\ln P(y, \\gamma|\\theta) = \\sum_{k=1}^K \\left\\{ n_k \\ln \\alpha_k + \\sum_{j=1}^N \\gamma_{jk} \\left[ \\ln (\\frac{1}{\\sqrt{2\\pi}}) - \\ln \\sigma_k - \\frac{1}{2 \\sigma_k^2} (y_j - \\mu_k)^2 \\right] \\right\\} $  \n",
    "\n",
    "### EM算法的E步，确定$Q$函数\n",
    "将隐变量都换成期望，隐变量有$\\gamma_{jk}$和$n_k$  \n",
    "$\\displaystyle \\because E(n_k) = E \\left(\\sum_j \\gamma_{jk} \\right) = \\sum_j E(\\gamma_{jk})，E(\\gamma_{jk} | \\theta^{(i)},y) = P(\\gamma_{jk}=1| \\theta^{(i)},y)$，求解期望时，是根据上一步的$\\theta^{(i)}$以及观测数据所有的$y_j$，需要知道$\\gamma_{jk}$的分布$P(\\gamma_{jk}=1|\\theta^{(i)},y)$。  \n",
    "$\\begin{aligned} \\because P(\\gamma_{jk}=1|\\theta^{(i)},y)\n",
    "=& \\frac{P(\\gamma_{jk}=1,y_j | \\theta^{(i)})}{P(y_j|\\theta^{(i)})} \\\\\n",
    "=& \\frac{P(\\gamma_{jk}=1,y_j | \\theta^{(i)})}{\\displaystyle \\sum_{k=1}^K P(\\gamma_{jk}=1,y_j | \\theta^{(i)})} \\\\\n",
    "=& \\frac{P(\\gamma_{jk}=1|\\theta^{(i)})P(y_i|\\gamma_{jk}=1, \\theta^{(i)})}{\\displaystyle \\sum_{k=1}^K P(y_j | \\gamma_{jk}=1,\\theta^{(i)}) P(\\gamma_{jk}=1|\\theta^{(i)})}\n",
    "\\end{aligned}$  \n",
    "$\\because \\alpha_k=P(\\gamma_{jk}=1|\\theta), \\phi(y_i|\\theta)=P(y_i | \\gamma_{jk}=1,\\theta)$  \n",
    "$\\displaystyle \\therefore E(\\gamma_{jk} | y, \\theta^{(i)}) = P(\\gamma_{jk}=1|\\theta^{(i)},y) = \\frac{\\alpha_k \\phi(y_i|\\theta^{(i)})}{\\displaystyle \\sum_{k=1}^K \\alpha_k \\phi(y_i|\\theta^{(i)})}$，其中$\\theta^{(i)}=(\\alpha_k^{(i)}, \\theta_k^{(i)})$  \n",
    "&emsp;&emsp;将$\\gamma_{jk}$关于给定$y$和$\\theta^{(i)}$的条件下的期望记为$Z_k=E(\\gamma_{jk} | y, \\theta^{(i)})$，因为各个样本之间是独立同分布的，所以$Z_k$是和$j$无关的。  \n",
    "$\\displaystyle \\therefore Q(\\theta, \\theta^{(i)}) = E_Z \\big[ln P(y,\\gamma | \\theta^{(i)})\\big] = \\sum_{k=1}^K \\left\\{ (N Z_k) \\ln \\alpha_k + Z_k \\sum_{j=1}^N \\left[ \\ln (\\frac{1}{\\sqrt{2\\pi}}) - \\ln \\sigma_k - \\frac{1}{2 \\sigma_k^2} (y_j - \\mu_k)^2 \\right] \\right\\}$  \n",
    "\n",
    "### 确定EM算法的M步\n",
    "需要估计的变量有$\\alpha_k,\\sigma_k,\\mu_k$，然后求偏导等于0：$$\\begin{array}{l} \n",
    "\\displaystyle \\frac{\\partial Q(\\theta, \\theta^{(i)})}{\\partial \\mu_k} = 0 \\\\\n",
    "\\displaystyle \\frac{\\partial Q(\\theta, \\theta^{(i)})}{\\partial \\sigma_k^2} = 0 \\\\\n",
    "\\left \\{ \\begin{array}{l} \n",
    "\\displaystyle \\frac{\\partial Q(\\theta, \\theta^{(i)})}{\\partial \\alpha_k} = 0 \\\\\n",
    "\\sum \\alpha_k = 1 \n",
    "\\end{array} \\right .\n",
    "\\end{array}$$根据上述公式可以推导出：$$\\begin{array}{l} \n",
    " \\mu_k^{(i+1)} = \\frac{\\displaystyle \\sum_{j=1}^N \\hat{\\gamma_{jk}} y_j}{\\displaystyle \\sum_{j=1}^N \\hat{\\gamma_{jk}} } \\\\\n",
    "(\\sigma_k^2)^{(i+1)} = \\frac{\\displaystyle \\sum_{j=1}^N \\hat{\\gamma_{jk}} (y_i - \\mu_k)^2}{\\displaystyle \\sum_{j=1}^N \\hat{\\gamma_{jk}} } \\\\  \n",
    "\\displaystyle \\alpha_k^{(i+1)} = \\frac{n_k}{N}= \\frac{\\displaystyle \\sum_{j=1}^N \\hat{\\gamma_{jk}}}{N} \\\\\n",
    "\\end{array}$$其中$\\displaystyle \\hat{\\gamma_{jk}}=E \\gamma_{jk}, n_k = \\sum_{j=1}^N E \\gamma_{jk}, k=1,2,\\cdots,K$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
